Expressing Arbitrary Reward Functions as Potential-Based Advice
Abstract
Effectively incorporating external advice is an important problem in reinforcement learning, especially as it moves into the real world. Potential-based reward shaping is a way to provide the agent with a specific form of additional reward, with the guarantee of policy invariance. In this work we give a novel way to incorporate an arbitrary reward function with the same guarantee, by implicitly translating it into the specific form of dynamic advice potentials, which are maintained as an auxiliary value function learnt at the same time. We show that advice provided in this way captures the input reward function in expectation, and demonstrate its efficacy empirically.
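The mechanism the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that the auxiliary value function Phi is learnt with a SARSA-style update on the negated advice reward, and that the advice given to the agent is the potential difference F = gamma * Phi(s', a') - Phi(s, a). The two-state cycle, the constants GAMMA and BETA, and all function names are hypothetical and chosen only for the example.

```python
from collections import defaultdict

GAMMA = 0.9   # discount factor (illustrative value)
BETA = 0.5    # learning rate for the auxiliary value function (illustrative)


def update_potential(phi, sa, next_sa, advice_reward):
    """One SARSA-style TD step on Phi, trained on the NEGATED advice reward."""
    td_error = -advice_reward + GAMMA * phi[next_sa] - phi[sa]
    phi[sa] += BETA * td_error


def shaping_reward(phi, sa, next_sa):
    """Potential-based advice term: F = gamma * Phi(s', a') - Phi(s, a)."""
    return GAMMA * phi[next_sa] - phi[sa]


def run_demo(steps=2000):
    # Hypothetical toy problem: two states visited in a deterministic cycle,
    # a single action, and an advice reward of 1.0 in state 0, 0.0 in state 1.
    advice = {(0, 0): 1.0, (1, 0): 0.0}
    successor = {(0, 0): (1, 0), (1, 0): (0, 0)}
    phi = defaultdict(float)  # dynamic advice potentials, initialised to zero
    sa = (0, 0)
    for _ in range(steps):
        next_sa = successor[sa]
        update_potential(phi, sa, next_sa, advice[sa])
        sa = next_sa
    # Once Phi has settled, the shaping term should match the advice reward.
    return {pair: shaping_reward(phi, pair, successor[pair]) for pair in advice}


if __name__ == "__main__":
    print(run_demo())
```

In a full agent, F would be added to the environment reward inside the main value update while Phi keeps being updated alongside it. On this deterministic toy chain the shaping term converges to the advice reward itself, which is the sense in which the advice "captures the input reward function in expectation".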
Related resources
Principled Methods for Advising Reinforcement Learning Agents
An important issue in reinforcement learning is how to incorporate expert knowledge in a principled manner, especially as we scale up to real-world tasks. In this paper, we present a method for incorporating arbitrary advice into the reward structure of a reinforcement learning agent without altering the optimal policy. This method extends the potential-based shaping method proposed by Ng et al....
Covariance Matrix of Multivariate Reward Processes with Nonlinear Reward Functions
Multivariate reward processes with constant-rate reward functions, defined on a semi-Markov process, were first studied by Masuda and Sumita (1991). Reward processes with nonlinear reward functions were introduced in Soltani (1996). In this work we study a multivariate process , , where are reward processes with nonlinear reward functions respectively. The Laplace transform of the covar...
Advice Generation from Observed Execution: Abstract Markov Decision Process Learning
An advising agent, a coach, provides advice to other agents about how to act. In this paper we contribute an advice generation method using observations of agents acting in an environment. Given an abstract state definition and partially specified abstract actions, the algorithm extracts a Markov Chain, infers a Markov Decision Process, and then solves the MDP (given an arbitrary reward signal)...
Analytical Solution for Two-Dimensional Coupled Thermoelastodynamics in a Cylinder
An infinitely long hollow cylinder containing isotropic linear elastic material is considered under the effect of arbitrary boundary stress and thermal condition. The two-dimensional coupled thermoelastodynamic PDEs are specified based on equations of motion and energy equation, which are uncoupled using Nowacki potential functions. The Laplace integral transform and Bessel-Fourier series are u...